Objectives: Efficient identification of subject experts or expert communities is vital for the growth of any organization. Most of the available expert finding systems are based on self-nomination, which can be biased, and are unable to rank experts. Thus, the objective of this work was to develop a robust and unbiased expert finding system that can quantitatively measure expertise. Methods: Medical Subject Headings (MeSH) is a controlled vocabulary developed by the National Library of Medicine (NLM) for indexing research publications, articles and books. Using the MeSH terms associated with peer-reviewed articles published from India and indexed in PubMed, we developed a Web-based program that can be used to identify subject experts and the subjects associated with an expert. Results: We have extensively tested our system to identify experts from India in various subjects. The system provides a ranked list of experts in which known experts rank at the top of the list. The system is general; since it uses information available in PubMed, it can be implemented for any country. Conclusions: The expert finding system is able to successfully identify subject experts in India. Our system is unique because it allows the quantification of subject expertise, thus enabling the ranking of experts. Our system is based on peer-reviewed information. Use of MeSH terms as subjects has standardized the subject terminology. The system matches the requirements of an ideal expert finding system.

I. Introduction

The ability to rapidly identify subject experts is essential for the successful functioning of an organization. Some of the advantages of an expert finding system include 1) rapid formation of operational or proposal teams to accelerate research [1]; 2) identification of potential collaborators; 3) matching reviewers to submitted research proposals, manuscripts and other peer-reviewed documents [2]; 4) identification of expertise available within organizations [3]; 5) monitoring the research priorities of an organization; and 6) prediction of the effects of skill loss (attrition or retirement) or gain (merger or acquisition).

Though useful, locating and evaluating subject experts is a difficult task because experts are rare and unevenly distributed, while the requirements of expertise seekers are often poorly articulated. Expertise seekers often lack information about the past performance of experts, and it is difficult to classify and quantify expertise. Relocation of experts further complicates the task of identifying subject experts. Finally, complex proposals, problems and issues require the combined wisdom of experts from multiple subjects. Thus, there is a need to develop a computer-based system for finding experts. Developments in this field can reduce the time required for solving time-sensitive problems, thus improving the overall efficiency of an organization.

An ideal expert finding system should have the following 
qualities: 

• Should be robust, that is, it should be easily and cost-effectively updated without much user intervention.

• Should be able to classify expertise into a standard subject 
classification schema. 

• Should provide information about the past performance 
of experts. 

• Should quantify expertise, enabling the ranking of experts. 

• Should be based on authentic sources of information. 

• Should be able to form expert communities. 

• Should be able to identify locally available experts. 

1. Available Expert Finding Systems 

Efforts have been made during the past 20 years to develop an expert finding system. A number of expert finding systems have been developed at both national and international levels [3-6]. Broadly, the methods used for developing expert finding systems can be grouped into two categories.

2. Methods Based on Mining Unstructured Information 

Unstructured information includes emails, corporate or personal Web pages, wikis, reports, etc. Text mining tools are used to index technical terms from unstructured documents, which can be queried to identify subject experts. Further, experts have been ranked using the number of occurrences of technical terms [7]. Some of the important tools using this information include the email expertise extraction (e3) system [8], ContactFinder [9], MIT Expert Finder [10], etc.

Although this method is robust and useful, a major limitation is the authentication of information. Another issue is that, because of privacy, complete information may not be available for mining. Moreover, there is a lack of implementation of a standard subject terminology.

3. Methods Based on Social Networking Sites or Contact 
Management Systems 

Today there are many social networking sites, some specifically for the scientific community, such as ResearchGate (http://www.researchgate.net/), Nature Network (http://network.nature.com/), VIVOweb, etc. Experts have to feed in information about their subject expertise, domains, publications, credentials, etc. A major limitation of these methods is in the adding and updating of information. Many of the developed systems, particularly discussion forums and knowledge directories, have become obsolete due to a decrease in the interest of experts. Experts may not be interested in subscribing to social sites or responding to queries. Further, there is difficulty in ranking experts.

A common drawback to both approaches is that they are 
based on non-peer-reviewed information provided by the 
user; hence, they can be biased. 

It is evident from the above discussion that identifying correct subject experts is extremely vital for the success of an organization. Most of the available expert finding systems are based on information provided by the individual or on mining non-peer-reviewed information and can be biased as a result. For this reason, there is a strong need to develop an automated, unbiased expert finding system which is based on authentic information and can be easily updated.

In this article we present an expert finding system that is based on peer-reviewed information, can be updated regularly, and uses a standardized subject vocabulary, i.e., the Medical Subject Headings (MeSH) associated with each article [11]. Using MeSH headings adds standardized subjects for querying. The latest release of the system can be used to search for experts in a particular subject and for the subjects associated with a particular expert from India. The methodology is general and can be implemented to identify subject experts from any country.

II. Methods 

1. Data Retrieval 

PubMed is one of the largest repositories of peer-reviewed 
articles published worldwide. Publications originating from 
India (affiliation India) were downloaded in XML format 
using an in-house developed script. The developed script 
uses the Bio::Biblio module of Bioperl to interact with the 
PubMed database over the internet. 
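
As an illustration of the retrieval step, the following Python sketch builds an NCBI E-utilities `esearch` request for records with an Indian affiliation and extracts the PMIDs from an XML response. It is not the authors' Bioperl script: the search term `India[Affiliation]`, the function names, the page size, and the sample response are all assumptions made for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term="India[Affiliation]", retstart=0, retmax=1000):
    """Return an esearch URL for one page of PubMed IDs.

    The default search term is an assumption; the paper only states that
    records with affiliation India were downloaded.
    """
    params = {"db": "pubmed", "term": term,
              "retstart": retstart, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(esearch_xml):
    """Extract the PMIDs listed in an esearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [id_node.text for id_node in root.findall("./IdList/Id")]


# Hypothetical, shortened response used only to show the parsing step.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>10000001</Id><Id>10000002</Id></IdList>
</eSearchResult>"""

if __name__ == "__main__":
    print(build_search_url())
    print(parse_pmids(SAMPLE_RESPONSE))
```

Paging through `retstart` in steps of `retmax` would cover a large result set; the full records for each PMID would then be fetched in XML in a second step.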

The 2013 MeSH subjects were downloaded from the MeSH 
browser (http://www.nlm.nih.gov/mesh/filelist.html) in tree 
format. 

2. Data Pre-processing 

Each XML record of articles downloaded from PubMed was parsed using the XML::Twig module, and relevant fields (Authors, Title, Journal, Abstract, Volume, Issue, Page, and MeSH) were extracted in an intermediate text file where each record begins with a 'START' tag and ends with an 'END' tag.

As we downloaded the MeSH as a text file, each MeSH record was parsed on the basis of MeSH code using the in-house developed script, and the code for each left node (parent) and right node (child) were labeled.
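
The parsing step can be sketched in Python with the standard `xml.etree.ElementTree` module instead of XML::Twig. This is only an approximation of the described workflow: the sample record is hypothetical and heavily shortened, only some of the listed fields are extracted, and the field labels written between the 'START' and 'END' tags are assumptions, since the paper does not give the exact layout of the intermediate file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened PubMed record for illustration.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>10000001</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Alpha</LastName><Initials>A</Initials></Author>
          <Author><LastName>Beta</LastName><Initials>B</Initials></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Microbiology</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Water Microbiology</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Yield one dict per PubmedArticle with a subset of the fields of interest."""
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        authors = [
            f"{a.findtext('Initials', '')} {a.findtext('LastName', '')}".strip()
            for a in article.iter("Author")
        ]
        yield {
            "pmid": article.findtext("./MedlineCitation/PMID"),
            "title": article.findtext(".//ArticleTitle"),
            "journal": article.findtext(".//Journal/Title"),
            "authors": authors,
            "mesh": [d.text for d in article.iter("DescriptorName")],
        }


def to_intermediate_text(record):
    """Format a record between 'START' and 'END' tags (field labels are assumed)."""
    lines = [
        "START",
        "PMID: " + record["pmid"],
        "Title: " + record["title"],
        "Journal: " + record["journal"],
        "Authors: " + "; ".join(record["authors"]),
        "MeSH: " + "; ".join(record["mesh"]),
        "END",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for rec in extract_records(SAMPLE_XML):
        print(to_intermediate_text(rec))
```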

3. Database Design: PubMed Data 

A database consisting of two tables was designed to store information downloaded from PubMed, and their structure is shown in Table 1. The minimum fields required for the expert finding system were stored in the database. This reduces the size of the database and makes it suitable for developing cross-platform standalone applications, such as Android or iOS apps.

For storing MeSH terms, we adopted a nested set model, 
which is suitable for storing and querying tree data structure. 
The MeSH data was pre-processed to identify the left (parent) 
and right (child) node for each given node. The structure of 
the MeSH data table is shown in Table 2. 

The nested set is an efficient model for storing and searching through hierarchical data. The technique uses a method of storing metadata (left and right numbers) about the nodes (MeSH terms) contained in the tree in order to provide the SQL parser with information about how to "walk" the hierarchy of nodes. A critical aspect of the nested set model is that it alleviates the need for a recursive technique to find all children.

The following rules were used to calculate left and right numbers:

• For the root node in the hierarchy, the left side value is 1, and the right side value is 2*n, where n is the number of nodes in the tree.

• For all other nodes, the right side value is left side + (2*n) + 1, where n is the total number of nodes below that node (children at any depth). For leaf nodes (nodes without children), the right side value is always equal to the left side value + 1.

• The left side value for any node is the next free number when the tree is walked counter-clockwise.
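
The numbering rules above can be sketched as a depth-first walk, followed by a single non-recursive query that returns a whole subtree. This is a minimal Python and SQLite illustration, not the authors' implementation: the tree fragment, the function names, and the table name `mesh` are assumptions, while the column names follow Table 2.

```python
import sqlite3

# Hypothetical fragment of the MeSH tree: parent -> list of children.
TREE = {
    "Microbiology": ["Bacteriology", "Environmental Microbiology", "Virology"],
    "Environmental Microbiology": ["Air Microbiology", "Water Microbiology"],
}


def number_nodes(tree, root):
    """Assign (left, right) numbers by a depth-first walk starting at 1."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter          # next free number on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)  # next free number on the way up
        counter += 1

    visit(root)
    return bounds


def build_table(bounds):
    """Store the numbered nodes in an in-memory table shaped like Table 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mesh (category_id INTEGER PRIMARY KEY, name TEXT, "
        "left_side INTEGER, right_side INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mesh (name, left_side, right_side) VALUES (?, ?, ?)",
        [(name, left, right) for name, (left, right) in bounds.items()],
    )
    return conn


def subtree(conn, name):
    """Return a node and all its descendants without recursion."""
    rows = conn.execute(
        "SELECT child.name FROM mesh AS parent, mesh AS child "
        "WHERE parent.name = ? "
        "AND child.left_side BETWEEN parent.left_side AND parent.right_side "
        "ORDER BY child.left_side",
        (name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    bounds = number_nodes(TREE, "Microbiology")
    conn = build_table(bounds)
    print(bounds)
    print(subtree(conn, "Environmental Microbiology"))
```

In this fragment the root receives (1, 12) for a tree of six nodes, and every leaf has a right value equal to its left value + 1, in line with the rules.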

4. Expertise Scoring Function 

Our system uses a simple expertise scoring function, which is the number of publications from a given expert containing a selected MeSH term.

Table 1. Structure of table containing PubMed records

Field        Type      Null  Key  Default  Extra
pmid         int(11)   No    PRI  Null
pubyear      int(11)   Yes        Null
mesh         longtext  Yes        Null
author       longtext  Yes        Null
affiliation  longtext  Yes        Null
citation     longtext  Yes        Null
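
The expertise score defined in this section (the number of publications by an expert that carry a selected MeSH term) can be sketched in a few lines of Python. The record layout, the author labels, and the function names are hypothetical; the cut-off of 20 mirrors the "Top 20 Experts" list shown by the Web interface.

```python
from collections import Counter

# Hypothetical records: each article has a list of authors and MeSH terms.
RECORDS = [
    {"pmid": 1, "authors": ["Author A", "Author B"], "mesh": ["Microbiology"]},
    {"pmid": 2, "authors": ["Author B"], "mesh": ["Microbiology", "Genetics"]},
    {"pmid": 3, "authors": ["Author A"], "mesh": ["Genetics"]},
]


def expertise_scores(records, mesh_term):
    """Count, per author, the articles indexed with the selected MeSH term."""
    scores = Counter()
    for record in records:
        if mesh_term in record["mesh"]:
            # set() guards against an author being listed twice on one article
            scores.update(set(record["authors"]))
    return scores


def rank_experts(records, mesh_term, top_n=20):
    """Return (author, score) pairs, highest score first, ties by name."""
    scores = expertise_scores(records, mesh_term)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    print(rank_experts(RECORDS, "Microbiology"))
```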

5. Statistical Significance of Subject Association 

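The statistical significance of identified experts for a given subject was estimated using a Z-score calculated from a contingency table (Figure 1):

$$Z = \frac{p_1 - p_2}{\sqrt{p\,q\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where

$$n_1 = (a + b) \quad \text{and} \quad n_2 = (c + d),$$

$$p = \frac{(a + c)}{(a + b + c + d)} \quad \text{and} \quad q = 1 - p,$$

$$p_1 = \frac{a}{n_1} \quad \text{and} \quad p_2 = \frac{c}{n_2}.$$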
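
A small numerical sketch may help in reading these definitions. It assumes that the Z-score is the standard pooled two-proportion statistic suggested by the symbols n1, n2, p, q, p1 and p2, and that the p-value is taken from the upper tail of the standard normal distribution; the paper states neither choice explicitly, and the counts used below are invented.

```python
import math


def z_score(a, b, c, d):
    """Pooled two-proportion Z-score from the contingency table cells.

    a: articles by the given author in the given subject
    b: articles by other authors in the given subject
    c: articles by the given author in other subjects
    d: articles by other authors in other subjects
    """
    n1 = a + b
    n2 = c + d
    p1 = a / n1
    p2 = c / n2
    p = (a + c) / (n1 + n2)   # pooled proportion
    q = 1 - p
    return (p1 - p2) / math.sqrt(p * q * (1 / n1 + 1 / n2))


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) under the standard normal (an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Invented counts: 70 of 1,000 subject articles are by the author,
    # against 30 of 9,000 articles in all other subjects.
    z = z_score(70, 930, 30, 8970)
    print(round(z, 2), upper_tail_p(z) < 0.0001)
```

When the author's share is the same inside and outside the subject, p1 equals p2 and the Z-score is zero, so no association is indicated.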

6. Web Interface 

A Web interface was developed using PHP for finding experts using a browser (Figure 2).

III. Results 

The objective of this study was to develop an unbiased and robust expert finding system using peer-reviewed information. We developed a Web-based system to find subject experts using MeSH associated with peer-reviewed articles indexed in PubMed. The system quantitatively ranks experts using MeSH terms associated with their peer-reviewed publications indexed in PubMed.

It is difficult to evaluate such a system as there are no benchmarks available; therefore, we evaluated the efficacy of the developed system using prior information about experts in subjects. Further, the statistical significance of the association between a subject and an expert was estimated using the Z-score (see Methods Section).

Table 2. Structure of Medical Subject Headings table for storing hierarchical data

Field        Type     Null  Key  Default  Extra
category_id  int(11)  No    PRI  0
name         text     Yes   MUL  Null
left_side    int(11)  Yes        Null
right_side   int(11)  Yes        Null

                         Given author              Excluding given author     Total
Given subject            a                         b                          a+b (number of articles in given subject)
Excluding given subject  c                         d                          c+d (number of articles in subjects other than given subject)
Total                    a+c (number of articles   b+d (number of articles
                         from given author)        from authors other than
                                                   given author)

Figure 1. Contingency table used to calculate statistical significance of association between given subject and expert.

[Screenshot: home page with the menu items About, Search, Help and Team, and the modules 'Find Experts', 'Subjects Associated with Expert' and 'Subject Associated with Text'.]

Figure 2. Home page of Expert Finding System available at (http://bmi.icmr.org.in/expert; http://202.141.106.122/expert).

Table 3. Experts from selected subjects along with their p-value for subject association calculated using Z-test

No.  Subject                Expert                  p-value  Comment
1    Microbiology           T. Ramamurthy           <0.0001  Dr. Ramamurthy has nearly 70% of his publications in microbiology
2    Computational biology  Gajendra P. S. Raghava  <0.0001  Dr. Raghava is a Bhatnagar awardee and a known international figure in bioinformatics
3    X-ray crystallography  M. Vijayan              <0.0001  With nearly 75% of his papers in the subject, Prof. M. Vijayan is a known crystallographer
4    Database, proteins     Gajendra P. S. Raghava  <0.0001  Has developed nearly 70 Web services and databases
5    Genetics               Lalji Singh             <0.0001  Prof. Lalji Singh has worked extensively in genetics, with more than 90% of his papers in the subject

Table 3 shows the statistical significance of identified experts from known branches of sciences, such as 'Microbiology', 'X-Ray crystallography', 'Database', 'Genetics'. The identified experts are well known international experts on the subjects.

1. Description of the System 

Below is a brief description of our system. Release 2.1 of our expert finding system allows users to find (1) experts from a particular subject and (2) subjects associated with a given expert.

1) Application: Finding experts in a given subject 
To locate a subject expert, the user has to enter either the partial or complete name of the subject in the 'Subject' text box and click 'Submit' (Figure 3). On submission, MeSH subjects similar to the query are displayed as a drop-down box (Figure 4). The user then selects the most relevant subject from the MeSH drop-down and clicks 'Submit'. A list of experts ranked on the basis of the number of articles published in the selected subject is displayed (Figure 5). The user can access a year-wise list of all affiliations of an author, all co-authors, MeSH subjects and all publications. The publications are linked to PubMed using pmid.

To search the MeSH terms, the system selects the entire subtree of the selected MeSH term. For example, to find experts in microbiology, the entire subtree, which includes bacteriology, virology, etc., is also searched.
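
The effect of searching the entire subtree can be illustrated with a short Python sketch. The nested-set bounds, the records, and the function names below are hypothetical; the point is only that an article indexed with a narrower term such as 'Bacteriology' is counted when the user selects 'Microbiology'.

```python
# Hypothetical nested-set bounds (left, right) for a small MeSH fragment.
BOUNDS = {
    "Microbiology": (1, 8),
    "Bacteriology": (2, 3),
    "Virology": (4, 5),
    "Mycology": (6, 7),
    "Genetics": (9, 10),
}

# Hypothetical articles with authors and MeSH terms.
RECORDS = [
    {"authors": ["Author A"], "mesh": ["Bacteriology"]},
    {"authors": ["Author A", "Author B"], "mesh": ["Virology", "Genetics"]},
    {"authors": ["Author B"], "mesh": ["Genetics"]},
]


def terms_in_subtree(bounds, selected):
    """Return the selected term plus every term nested inside its bounds."""
    left, right = bounds[selected]
    return {term for term, (lo, hi) in bounds.items()
            if left <= lo and hi <= right}


def count_by_author(records, bounds, selected):
    """Count, per author, the articles indexed anywhere in the subtree."""
    wanted = terms_in_subtree(bounds, selected)
    counts = {}
    for record in records:
        if wanted.intersection(record["mesh"]):
            for author in set(record["authors"]):
                counts[author] = counts.get(author, 0) + 1
    return counts


if __name__ == "__main__":
    print(sorted(terms_in_subtree(BOUNDS, "Microbiology")))
    print(count_by_author(RECORDS, BOUNDS, "Microbiology"))
```

Here no article carries the term 'Microbiology' itself, so an exact-match search would return no experts, whereas the subtree search credits both authors.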

2) Application: Finding subjects associated with an expert 
To locate subjects associated with an expert, the user enters either the partial or complete name of the expert in the 'Last name, First name or initials' text box and clicks 'Submit'. On submission, the names of experts that are similar to the query are displayed as a drop-down box. The user then selects an expert from the drop-down box and clicks 'Submit'. A list of subjects ranked on the basis of the number of articles published in the subject is displayed (Figure 6). The user can access all publications of an expert or all of their publications in a given subject; they can also search for other experts in a given subject.

IV. Discussion 

The expert finding system developed using MeSH terms has been tested using known subject experts from India and was found to be satisfactory. For example, in the subject of crystallography, Prof. M. Vijayan, Prof. T. P. Singh, Prof. A. Srinivasan, and Prof. M. R. Murthy are ranked as top experts. They are all well-known experts in crystallography in India. Similarly, the system was successfully tested for other subjects.

Figure 3. Finding subject experts: enter complete or partial subject.

[Screenshot: the text 'microbiology' entered in the 'Enter the Medical Term' box, with a 'Please Select the Mesh Term' list showing Microbiology and narrower terms such as Bacteriology, Environmental Microbiology, Industrial Microbiology and Mycology.]

Figure 4. Selecting from list of Medical Subject Headings subjects related to entered text.

[Screenshot: 'Top 20 Experts of Microbiology' table with the columns Author, Number of Publications, Percentage of Publications and Details, and links to an expert's subject-related publications, all publications, co-author analysis and affiliations.]

Figure 5. List of subject experts in 'microbiology'.

However, the system has some limitations, some of which will be addressed in future releases of the system. One major limitation of using the MeSH terms associated with an article as the expertise of all the authors of that paper is that some of the experts may have different expertise and may merely have been co-authors of the paper. For example, biomedical work requires statistical assistance, so a statistical expert may become associated with biomedical subject headings. However, given the large number of articles from which the data is collected, the probability of association of an expert with related subjects is higher than with unrelated subjects. In our experience, any subject in which an author has more than 20 publications can be considered to be associated with that author. To address this issue, we calculated two parameters:

(i) The percentage of contribution to the field, which is calculated as

$$\text{percent contribution to the field } (pc) = \frac{n_s}{N},$$

where $n_s$ is the number of publications of a given author in the selected subject, and $N$ is the total number of publications by that author.

(ii) A statistical association test, based on the Z-score as described in the Methods Section.
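
The first parameter is simple enough to sketch directly. In the Python snippet below, the multiplication by 100 (to express the ratio as a percentage) is an assumption, the function names are hypothetical, and the threshold check merely encodes the rule of thumb stated above that more than 20 publications in a subject indicate an association.

```python
def percent_contribution(n_s, total):
    """Share of an author's publications that fall in the selected subject.

    n_s:   publications of the author in the selected subject
    total: all publications by that author (N in the text)
    The factor of 100 is an assumption made to report a percentage.
    """
    if total <= 0:
        raise ValueError("total number of publications must be positive")
    if not 0 <= n_s <= total:
        raise ValueError("n_s must lie between 0 and total")
    return 100.0 * n_s / total


def is_associated(n_s, threshold=20):
    """Rule of thumb from the text: more than 20 publications in a subject."""
    return n_s > threshold


if __name__ == "__main__":
    # Invented example: 70 of an author's 100 publications are in the subject.
    print(percent_contribution(70, 100), is_associated(70))
```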



Figure 6. Searching subjects associated with an expert.

We found a correlation between the percentage of contribution to the field and the p-value calculated from the Z-score. However, neither pc nor the Z-score was correlated with the number of publications.

Another possible limitation is that many experts publish in journals that are not indexed in PubMed, and since we have used data extracted from PubMed, the ranking of subject experts may be incomplete. There are some Web services available for assigning MeSH terms to articles [12]. The integration of non-indexed journals will be done after verifying the credibility of sources.

The developed system is unique in that it uses a purely objective measure to identify subject experts, the data used are peer-reviewed and reliable, and it allows the ranking of experts. For developing countries where resources for research are limited, developing such a system can improve the efficiency of research.

We are trying to improve the system by incorporating 1) graphical representation of co-author networks, 2) analysis tools for co-author networks, 3) clustering of affiliations to identify most probable affiliations, and 4) geographical mapping of expertise.